SUMMARY

We introduce a new approach to describing Java software as a graph. Each node represents a Java file, called a compilation unit (CU), and edges represent the relations between them. The software system is characterized by the distributions of graph properties, such as in- and out-links, and by the distribution of the Chidamber and Kemerer metrics computed on its CUs. Every CU can be related to one or more bugs during its life. We find a relationship between the structure of the software system and the bugs hitting its nodes. The distributions of some metrics, of the number of bugs per CU, and of the number of CUs affected by a specific bug all exhibit a power-law behavior in their tails. We examine the evolution of software metrics across different releases to understand how the relationships between CU metrics and CU faultiness change over time.

KEYWORDS: Software graphs, object-oriented programming, statistical methods, complexity measures,
software metrics, bug distribution.



1. INTRODUCTION 

Large software systems can be represented as graphs so large and intricate that they can be studied using complex network theory.

In object-oriented (OO) software systems, the nodes are the classes or interfaces. The directed edges are the various kinds of relationships between them: inheritance, composition and dependence. For OO systems there are also some consolidated software metrics, usually computed at class level, which can be associated with the graph. The most widely used is the Chidamber and Kemerer (CK) suite of metrics [1]. The relationship between metrics and software quality is fuzzy, and is still the subject of ongoing research.

Software bugs are closely related to software quality. Several researchers have analysed software evolution to understand the relationship between software management and bug issues. Purushothaman et al. [2] analyzed the software development process to identify the relationships between small changes to the code and bug growth. Kim et al. [3] analyzed micro-pattern evolution in Java classes to identify which micro-patterns are more bug-prone. Sliwerski et al. [4] analyzed fix-inducing changes, i.e. software updates that trigger the appearance of bugs. In their work, the revision history associated with compilation units (CUs) was examined to understand where bugs are introduced during CU evolution. Compilation units, the basic blocks examined in this paper, are files containing one or more classes. Software metrics similar to those used for classes can be computed for them.

To our knowledge, no complete analysis exists of the relationships among three things: the graph properties of large software systems, the statistics of software metrics, and the introduction and distribution of bugs in such graphs. Zimmermann et al. [5] performed a network analysis on dependency graphs built on binary files, studying how dependencies correlate with, and predict, defects. Andersson et al. [6] discussed the Pareto distribution of bugs in classes, without going into the statistical properties of software that determine such a distribution. Zhang found that the bug distribution across the packages of the Eclipse Java system seems to follow a Weibull distribution [7].

The aim of this paper is to study OO systems using complex network theory. We want to improve the knowledge of the causes of bugs and to determine their distribution across the system statistically. We extend the definitions of the CK software metrics to CUs to understand the evolution of faultiness, i.e. how a metric variation affects the number of bugs hitting a CU. A deeper understanding of the dynamics of software development could help software engineers identify which system components will be more prone to bugs. They could then focus testing and code reviews on these components.

We also study the time evolution of software systems and of the related graphs and metrics. To do so, we analyse both the source code and the bugs of various releases of two large Java systems, Eclipse and Netbeans. For each release we computed the associated software graph and the CK metrics for each class. Furthermore, we study the number of defects associated with CUs, as found in the bug-tracking system used for development.

We computed the correlation between OO metrics and bugs, and analyzed how these metrics evolve from one release to the next, correlating metric changes with the number of defects. We present a classification scheme for CUs. It allows us to identify which parts of the software are the most fault-prone, and how these are correlated with CK software metrics. We support our findings with significance tests.
2. Method

We analyze the source code of two object-oriented systems written in Java, Eclipse and Netbeans. Both use CVS as their version control system. Eclipse uses Bugzilla as its issue tracking system, while Netbeans uses Issuezilla. CVS keeps track of the source code history, while Bugzilla and Issuezilla keep track of the bug history.

2.1. Software graph and OO metrics 

A directed graph is associated with an OO software system. Its nodes are the classes and interfaces, and its edges are the relationships between classes, namely inheritance, composition and dependence.

The number and orientation of edges allow us to study the coupling between nodes. In this graph the in-degree of a class is the number of edges directed toward the class. It measures how much the class is used by other classes of the system. The out-degree of a class is the number of edges leaving the class. It represents the extent to which the class uses other classes in the system. In this context the CK suite is a common set of metrics used in class analysis. For each node we calculated the values of the four most relevant CK metrics of the associated class:

• Weighted Methods per Class (WMC). A weighted sum of all the methods defined in a 
class. We set the weighting factor to one to simplify our analysis. 

• Coupling Between Objects (CBO). The number of classes to which a given class is coupled.

• Response For a Class (RFC). The sum of the number of methods defined in the class, and 
the cardinality of the set of methods called by them and belonging to external classes. 

• Lack of Cohesion of Methods (LCOM). The difference between the number of non-cohesive method pairs and the number of cohesive pairs.
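The following minimal Python sketch makes the LCOM definition concrete. It computes LCOM for a single class from a hypothetical map of each method to the instance fields it uses. It assumes the original CK convention: two methods form a cohesive pair when they share at least one field, and negative differences are clipped to zero. The text does not state which LCOM variant the authors' tool implements. Dropping the `max(..., 0)` gives the raw difference.

```python
from itertools import combinations


def lcom(field_usage):
    """CK LCOM for one class.

    field_usage: dict mapping each method name to the set of instance fields it uses.
    A pair of methods is cohesive if the two field sets intersect.
    Returns max(non_cohesive_pairs - cohesive_pairs, 0).
    """
    non_cohesive = 0
    cohesive = 0
    for fields_a, fields_b in combinations(field_usage.values(), 2):
        if fields_a & fields_b:
            cohesive += 1
        else:
            non_cohesive += 1
    return max(non_cohesive - cohesive, 0)


# Hypothetical class with four methods and the fields each one reads or writes.
usage = {
    "getName": {"name"},
    "setName": {"name"},
    "getAge": {"age"},
    "log": set(),
}
print(lcom(usage))
```

Only `getName` and `setName` share a field, so this example has 1 cohesive pair and 5 non-cohesive pairs.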

We also computed the lines of code of the class (LOC), excluding blank and comment lines. This helps keep track of CU size, because a "long" class is known to be more difficult to manage than a short one.
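Below is a minimal sketch of the LOC count described above, assuming Java source given as a string. It skips blank lines and lines containing only `//` or `/* ... */` comments. As a stated simplification, comment markers inside string literals are not recognised. It is therefore an approximation, not the counter actually used in the study.

```python
def count_loc(source):
    """Count lines that contain code, ignoring blank lines and // or /* */ comments.

    Simplification: comment markers inside string or char literals are not
    treated as literals, so a line such as  s = "//x";  may be miscounted.
    """
    loc = 0
    in_block = False  # True while inside a /* ... */ comment
    for line in source.splitlines():
        code = []  # characters of this line that lie outside comments
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)          # the rest of the line is comment
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break                      # the rest of the line is a line comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        if "".join(code).strip():
            loc += 1
    return loc


java_src = """\
/* License header
 * spanning two lines */
package demo;

// a line comment
public class Demo {
    int x; /* trailing comment */
    /** Javadoc */
    void f() { x++; } // inline
}
"""
print(count_loc(java_src))
```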

Every class of the system resides inside a Java file, called a compilation unit (CU). While most files include just one class, some include more than one. In Eclipse 10% of CUs host more than one class, whereas in Netbeans this percentage is 30%. In commit messages, issues and issue fixes always refer to CUs. To make issue tracking consistent with the source code, we decided to extend the CK metrics from classes to CUs. CUs are therefore the main element of our study. We thus defined a CU graph whose nodes are the CUs of the system. Two nodes are connected by a directed edge if at least one class inside the first CU has a dependency relationship with a class inside the second CU. We use this graph to compute the in-links and out-links of a CU node. We reinterpreted the CK metrics on this CU graph:

• CU-LOCS is the sum of the LOCS of the classes contained in the CU;

• CU-CBO is the number of out-links of each node, excluding those representing inheritance. This definition is consistent with that of the CBO metric for classes;

• CU-LCOM and CU-WMC are the sums of the LCOM and WMC metrics, respectively, of the classes contained in the CU;

• CU-RFC is the sum of the weighted out-links of each node. Each out-link is multiplied by the number of distinct relationships between the classes belonging to the two CUs it connects.
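The following Python sketch shows one way to lift class-level relations to the CU graph and derive in-links, out-links, CU-CBO and CU-RFC. The input format, the relation labels `"uses"`, `"inherits"` and `"implements"`, and the function name are hypothetical. The sketch also makes three assumptions:

- relations between classes of the same CU create no edge;
- an out-link counts toward CU-CBO when at least one of its relations is neither inheritance nor interface implementation;
- the RFC weight of an out-link is the number of distinct class-level relations it aggregates.

```python
from collections import defaultdict

# Hypothetical relation labels; "inherits" and "implements" count as inheritance.
INHERITANCE_KINDS = {"inherits", "implements"}


def cu_graph_metrics(class_to_cu, relations):
    """Compute in-links, out-links, CU-CBO and CU-RFC for every CU.

    class_to_cu: dict mapping class name -> CU (file) name.
    relations: iterable of (source_class, target_class, kind) tuples.
    """
    # Group the distinct class-level relations by directed CU pair.
    edge_relations = defaultdict(set)
    for src, dst, kind in relations:
        a, b = class_to_cu[src], class_to_cu[dst]
        if a != b:  # assumption: relations inside one CU create no edge
            edge_relations[(a, b)].add((src, dst, kind))

    metrics = {cu: {"in": 0, "out": 0, "cbo": 0, "rfc": 0}
               for cu in set(class_to_cu.values())}
    for (a, b), rels in edge_relations.items():
        metrics[a]["out"] += 1
        metrics[b]["in"] += 1
        metrics[a]["rfc"] += len(rels)  # out-link weighted by its distinct relations
        if any(kind not in INHERITANCE_KINDS for _, _, kind in rels):
            metrics[a]["cbo"] += 1      # links made only of inheritance are excluded
    return metrics


classes = {"Shape": "Shape.java", "Circle": "Circle.java",
           "Helper": "Circle.java", "Canvas": "Canvas.java"}
rels = [
    ("Circle", "Shape", "inherits"),
    ("Canvas", "Shape", "uses"),
    ("Canvas", "Circle", "uses"),
    ("Canvas", "Helper", "uses"),
    ("Helper", "Circle", "uses"),  # same CU: ignored
]
print(cu_graph_metrics(classes, rels))
```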

For each CU we thus have a set of 7 metrics: In-links, Out-links, CU-LOCS, CU-LCOM, CU-WMC, CU-RFC and CU-CBO. We computed them for all versions of Eclipse and Netbeans.

2.2. Bug extraction and metric 

On the CU graph we look for nodes hit by issues. To obtain this information we must check both the CVS log file and the data stored in the issue tracking system (ITS).

We consider a CU affected by an issue when it is modified to fix that issue. Developers record all fixing activities in the CVS log, where every commit operation is tracked as a single entry. Each entry contains various data, including:

• the date;

• the developer who made the changes;

• an annotation describing the reasons for the commit;

• the list of CUs involved in the commit.

When a commit is associated with an issue-fixing activity, this is written in the annotation, though not in a standardized way. Obtaining a correct mapping between issue(s) and the related CU(s) is therefore not simple [4], [10].

In our approach, we first analyzed the CVS log to locate commit messages associated with fixing activities. Then we matched the extracted data with the information found in the ITS. Each issue is identified by a positive integer (ID). A commit message may contain a string such as "Fixed 141181" or "bug #141181", but sometimes only the ID is reported. Since every positive integer is a potential issue ID, we applied the following strategies to distinguish issue IDs from other numbers:

1. we considered as valid issue IDs only the positive integers that are present in the issue tracker and related to the same release;

2. we discarded some numeric intervals particularly prone to yielding false-positive issue IDs.
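The two filtering strategies can be sketched as follows. The set of known IDs for the release and the excluded numeric ranges are hypothetical inputs here. The text does not give the actual intervals the authors discarded.

```python
import re


def extract_issue_ids(message, known_ids, excluded_ranges=()):
    """Return the numbers in a commit message that pass both filtering strategies.

    known_ids: IDs present in the issue tracker for the release under study.
    excluded_ranges: (low, high) intervals considered prone to false positives.
    """
    found = set()
    for token in re.findall(r"\d+", message):
        n = int(token)
        if n <= 0 or n not in known_ids:
            continue  # strategy 1: must be a valid issue ID for this release
        if any(low <= n <= high for low, high in excluded_ranges):
            continue  # strategy 2: discard numbers in suspicious intervals
        found.add(n)
    return found


known = {141181, 141200, 2005, 35}  # hypothetical tracker contents
excluded = [(1, 2100)]              # hypothetical "low ID" interval
msg = "Fixed 141181 and bug #141200; tested on build 2005, see line 35"
print(sorted(extract_issue_ids(msg, known, excluded)))
```

The example shows why low numeric intervals are risky. Build numbers, years or line references can coincide with existing issue IDs.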

The latter condition is not particularly restrictive in our study, because we do not consider the first releases of the studied projects, where issues with "low" IDs appear.

All IDs not filtered out are considered issues. They are associated with the addition or modification of one or more CUs, as reported in the commit logs. The total number of issues hitting a CU in each release is the issue metric we consider in this study.

Note that an issue reported in an issue management system has a broad meaning. It may denote an error in the code, but also an enhancement of the system, a feature request, or the fix of a requirement error. Moreover, when many CUs are affected by a single bug, some of them may have been modified not because they contain the issue, but as a side effect of modifications made to other CUs.
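Once each fix commit has been mapped to its issue IDs and touched CUs, two quantities follow. One is the per-CU issue metric. The other is its dual, the number of CUs touched by each issue. The sketch below computes both. It assumes that an issue fixed through several commits touching the same CU is counted once for that CU. The input format is hypothetical.

```python
from collections import Counter


def issue_metrics(fix_commits):
    """Compute bugs per CU and CUs per bug from fix commits.

    fix_commits: iterable of (issue_ids, touched_cus) pairs, one per commit.
    Each (issue, CU) pair is counted once, even if several commits repeat it.
    """
    pairs = {(issue, cu)
             for issue_ids, cus in fix_commits
             for issue in issue_ids
             for cu in cus}
    bugs_per_cu = Counter(cu for _, cu in pairs)
    cus_per_bug = Counter(issue for issue, _ in pairs)
    return bugs_per_cu, cus_per_bug


commits = [  # hypothetical, already filtered fix commits
    ({101}, ["A.java", "B.java"]),
    ({101}, ["A.java"]),            # a second commit for the same issue
    ({102, 103}, ["A.java"]),
    ({104}, ["C.java", "B.java", "D.java"]),
]
bugs_per_cu, cus_per_bug = issue_metrics(commits)
print(bugs_per_cu.most_common())
print(cus_per_bug.most_common())
```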
Table I: Number of CUs of Eclipse for each main release

Release          2.1     3.0     3.1     3.2     3.3
Number of CUs    7885    10584   12174   13221   14564

3. Results 

The subjects of our study were the Eclipse and Netbeans projects, both open-source, object-oriented, Java-based systems. Tables I and II show the number of CUs in the main releases of Eclipse and Netbeans, respectively.



Table II: Number of CUs of Netbeans for each main release

Release          3.2     3.3     3.5     3.6     4.0     5.0     6.0
Number of CUs    3350    4421    7391    8350    9365    12137   37145



A software system usually evolves through successive releases. Main releases entail substantial enhancements of the system, and are usually characterized by significant changes in software size, as the data in Tables I and II show. Between two main releases there may be several "patching releases", intended to fix bugs and provide minor enhancements. Although we analyzed all the releases, we report results only for the main releases and for the patching release immediately preceding the next main release. In fact, most bugs are introduced in the upgrade from the last patching release to the next main release.

3.1. Statistical analysis 

We computed the statistical distributions of the software metrics underlying the software graph. We compared two sets of metrics: those for software graphs built using classes as basic units, already studied in the literature, and those obtained in this work for software graphs built on CUs. In all cases, the CU distributions substantially preserve the "fat-tail" behavior of the corresponding class metrics [11]. Fig. 1 reports the log-log plot of the complementary cumulative distribution functions (CCDF) of the CBO metric of Eclipse 3.2 for classes and for CUs.

Fig. 2 reports the CCDF of the CBO metric, this time for Netbeans 4.0. All these distributions exhibit a power-law behavior in their tail.

We recall that a quantity x obeys a power law if it is drawn from a probability distribution 
proportional to a negative power of x: 

$$p(x) \propto x^{-\gamma}, \qquad \gamma > 1 \qquad (1)$$

γ is the power-law coefficient, also known as the exponent or scaling parameter. The corresponding complementary cumulative distribution function (CCDF) is the probability that the random variable is greater than a given value x:

$$P(X > x) \propto x^{-(\gamma - 1)} \qquad (2)$$

Figure 1: The CCDF of CBO metrics for classes (crosses) and CUs (stars) in Eclipse 3.2.

Figure 2: The CCDF of CBO metrics for classes (crosses) and CUs (stars) in Netbeans 4.0.

A power-law, or Pareto, distribution cannot hold for x = 0, so eligible values of x must be greater than a positive number $x_{min}$. This allows us to consider distributions that are power laws only in their right tail, that is, for x greater than $x_{min}$, and not for lower values of x. All the distributions shown in Figs. 1 and 2 show a straight-line behavior in their right tail. Note that the CCDF has the same analytical form as the probability distribution, with the exponent reduced by one. Hence plotting p(x) or P(X > x) on a log-log scale yields a straight line, with slope -γ or -(γ - 1), respectively.
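For a continuous power law with lower cutoff $x_{min}$, the step from Eq. (1) to Eq. (2) can be made explicit. The derivation below is a worked illustration using the standard normalization, which requires γ > 1:

```latex
p(x) = C\,x^{-\gamma}, \quad x \ge x_{min}, \qquad
\int_{x_{min}}^{\infty} C\,x^{-\gamma}\,dx = 1
\;\Rightarrow\; C = (\gamma - 1)\,x_{min}^{\gamma - 1}

P(X > x) = \int_{x}^{\infty} C\,t^{-\gamma}\,dt
         = \frac{C}{\gamma - 1}\,x^{-(\gamma - 1)}
         = \left(\frac{x}{x_{min}}\right)^{-(\gamma - 1)}

\log P(X > x) = -(\gamma - 1)\log x + (\gamma - 1)\log x_{min}
```

The last line is the straight line seen on a log-log plot, with slope -(γ - 1).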

Figs. 3 and 4 show the CCDF of the WMC metric in Eclipse 3.2 and in Netbeans 4.0, respectively. These distributions are also quite similar, and again exhibit a power-law behavior in their tail, both for classes and for CUs. We found this behavior in all other releases and for all metrics.
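CCDF plots like those in Figs. 1 to 4 can be produced from raw metric values with an empirical estimate. The minimal Python sketch below computes P(X ≥ x) at each distinct observed value, along with the corresponding log-log coordinates. It uses ≥ rather than > so that every plotted probability stays positive, as a log-log plot requires. The function names are hypothetical.

```python
import math


def empirical_ccdf(values):
    """Return (x, P(X >= x)) for each distinct observed value x, in increasing order."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))  # n - i samples are >= x
    return points


def loglog(points):
    """Base-10 log coordinates for a log-log plot; non-positive x values are skipped."""
    return [(math.log10(x), math.log10(p)) for x, p in points if x > 0]


cbo = [1, 1, 2, 3, 3, 3, 10]  # hypothetical CBO values of seven CUs
print(empirical_ccdf(cbo))
print(loglog(empirical_ccdf(cbo)))
```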

The distributions of CU metrics largely coincide with those of the corresponding class metrics. This suggests that considerations valid for CUs may also be extended to classes, even where class-level data are not directly accessible, as in our case for bugs. In fact, one goal of this paper is to use the software graph framework to find existing correlations between bugs and metrics. Since bug information for classes cannot be extracted directly from the repository, we analyzed the bug metric only for CUs, and used this information to obtain clues about classes.
Fig. 5 shows the CCDF of the number of bugs per CU in Eclipse 3.2. Fig. 6 shows the same distribution in Netbeans 3.4. The meaning of these power-law tails is unequivocal: while most CUs have very few bugs, a non-negligible number of CUs have very many bugs. We found similar patterns in all other main releases.
Figure 3: The CCDF of WMC metrics for classes (crosses) and CUs (stars) in Eclipse 3.2.

Figure 4: The CCDF of WMC metrics for classes (crosses) and CUs (stars) in Netbeans 4.0.

Figure 5: The CCDF of the number of bugs per CU in Eclipse 3.2.

Figure 6: The CCDF of the number of bugs per CU in Netbeans 3.4.
On the basis of these similarities, two hypotheses seem sensible. First, the power law governing the bug distribution among CUs may extend to classes and to other units, such as modules or packages. Second, this power law is a property of the graph structure of the system.

Similar results were in fact obtained by Andersson and Runeson [6] and by Zhang [7]. Andersson and Runeson suggest that a Pareto law governs the distribution of bugs across the basic units of a software system that is only partially OO. They show that a few modules contain most of the bugs (the 80-20 rule [12]). Zhang re-examined their results for the Eclipse software system, studying packages instead of modules, and found that a Weibull distribution fits the data better than a power law. Since the tail of a Weibull distribution is often indistinguishable from a power-law tail, their results support our hypothesis.

We consider the following our most relevant finding. We verified that a power-law distribution may be appropriate to describe the fat tail of the distributions of different quantities. Note that the fat tail contains the software units that carry most of the information. Suppose a metric follows a power law, even only in its tail, with a sufficiently small scaling exponent. Then relatively few units have the highest values of the metric, and this is where criticality resides, while most other units are much less critical. The 80-20 Pareto principle is a consequence of this: about 80% of the criticality is held by 20% of all units.
Our analysis is finer than those performed in [6] and [7]: we analyzed the software structure and relationships at the level of compilation units, one level deeper than the module or package level considered in those works. This allowed us to recover finer information on the distributions of metrics, especially in their tail. Our results confirm those of Andersson and Runeson, and of Zhang. They show that the same framework holds at different scales, exhibiting a scale-free structure [13]. This finding qualitatively supports the use of power laws. Finally, Louridas et al. [14] also show a wide variety of cases in which power laws account well for the distribution of different software properties.
For the distribution of the number of bugs per CU, the exponent γ tends to lie between 2.5 and 3.5 in the various releases examined, for both Eclipse and Netbeans.
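The text does not state how γ was estimated. One common choice, shown here only as an illustration, is the maximum-likelihood estimator described by Clauset, Shalizi and Newman. It is applied to the values at or above a chosen $x_{min}$. The sketch also draws synthetic power-law samples by inverse-transform sampling to check the estimate. The function names and the discrete-correction flag are assumptions of this sketch.

```python
import math
import random


def fit_gamma(values, x_min, discrete=False):
    """Maximum-likelihood estimate of the power-law exponent for the tail x >= x_min.

    Continuous estimator: gamma = 1 + n / sum(ln(x_i / x_min)).
    With discrete=True, x_min is replaced by x_min - 0.5 inside the logarithm,
    a common approximation for integer data such as bug counts.
    """
    ref = x_min - 0.5 if discrete else x_min
    if ref <= 0:
        raise ValueError("x_min too small for this estimator")
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / ref) for x in tail)
    if not tail or log_sum <= 0:
        raise ValueError("not enough spread above x_min to estimate gamma")
    return 1.0 + len(tail) / log_sum


def sample_power_law(gamma, x_min, n, rng):
    """Draw n continuous samples with p(x) ∝ x^-gamma for x >= x_min (inverse transform)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


data = sample_power_law(2.8, 1.0, 20000, random.Random(42))
print(round(fit_gamma(data, 1.0), 2))  # should be close to 2.8
```

Choosing $x_{min}$ is a separate step, for example minimizing the Kolmogorov-Smirnov distance between the data and the fitted tail. It is not shown here.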

According to [14], a mathematical description of the fat tail may have relevant consequences for software engineering. For example, it can help select which parts of the software project deserve more care and effort, also from an economic point of view. Consider n modules characterized by a metric that follows a power law with exponent γ. The expected maximum value of this metric, i.e. the average value ⟨x_max⟩ in the module with the highest metric value, is given by a formula reported in [15]. This formula provides a definite expectation of the maximum value taken by the metric. Modules whose metric values reach this order of magnitude can therefore be flagged.
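The exact formula from [15] is not reproduced here. As an order-of-magnitude sketch only, use the CCDF of Eq. (2) with the normalization $P(X > x) = (x/x_{min})^{-(\gamma-1)}$. The largest of n values is then roughly the value that only about one of the n modules is expected to exceed:

```latex
n\,P(X > x_{max}) \approx 1,
\qquad P(X > x) = \left(\frac{x}{x_{min}}\right)^{-(\gamma - 1)}
\;\Rightarrow\;
x_{max} \approx x_{min}\, n^{1/(\gamma - 1)}
```

For example, with hypothetical values γ = 3, $x_{min}$ = 1 and n = 10000, this gives $x_{max} \approx 100$.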
We also studied the distribution of the number of CUs hit by a single bug, the dual of the distribution of bugs across CUs. In this case too we find a power law, as shown in Figs. 7 and 8 for Eclipse and Netbeans, respectively. This means that, while most bugs affect just one or a few CUs, some bugs affect tens, or even hundreds, of CUs.
Figure 7: The CCDF of the number of CUs associated with each bug in Eclipse 3.2.

Figure 8: The CCDF of the number of CUs associated with each bug in Netbeans 3.4.



The exponent γ of the distributions of the number of CUs affected by a bug is consistently between 2.2 and 2.9 in all considered releases, for both Eclipse and Netbeans. This means an even fatter tail than that of the distribution of bugs per CU discussed above.

The finding that the distribution of bugs across CUs follows a power law may suggest a model for the introduction and spread of bugs in the software system. As already specified, in our investigation we call "bug" every numerical identifier found in the repository and associated with a software fix. Generally speaking, then, a bug reported in a CU means that the CU had to be partially modified because of that bug.

Now consider the graph structure of the software system. We, and many other authors, have verified that such graphs have an organized structure, with power-law distributions for many properties of the system. In particular, some nodes are linked with many other nodes, playing the role of "hubs" of the system. For example, a few CUs have a large number of in-links, meaning that they are extensively used by other CUs. If a bug hits such a CU, so that its code needs to be modified, the code of the CUs linked to it very likely needs to be modified too.

This mechanism may generate a sort of defect propagation in the software graph, very similar to the spread of a contagious disease. The system gets infected by bugs, and a single bug may affect many different CUs if it propagates from a hub node. Conversely, bugs in CUs with very few links will likely remain confined to a small number of CUs. Our heuristic conclusion is that the power laws observed for the bug distribution are probably due to the scale-free structure of the software graph. Bugs propagate inside a constraining framework, which determines their diffusion across the software system.
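The propagation mechanism described above can be illustrated with a toy reachability computation, which is not the authors' model. Assume that a change to a CU can propagate only to the CUs that depend on it, i.e. those that have out-links toward it. The sketch then collects the CUs that may need modification within a given number of hops from the CU first hit by a bug. The edge-list format and the function name are hypothetical.

```python
from collections import deque


def potentially_affected(edges, start, max_depth=1):
    """CUs reachable from `start` by following links backwards, up to max_depth hops.

    edges: iterable of (a, b) pairs meaning CU a has an out-link to CU b
    (a depends on b), so a change in b may force a change in a.
    Returns the reached CUs, excluding `start` itself.
    """
    dependents = {}
    for a, b in edges:
        dependents.setdefault(b, set()).add(a)

    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        cu, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for d in dependents.get(cu, ()):
            if d not in seen:
                seen.add(d)
                frontier.append((d, depth + 1))
    seen.discard(start)
    return seen


edges = [("A", "Util"), ("B", "Util"), ("C", "Util"),
         ("D", "A"), ("Util", "Log"), ("E", "F")]
print(sorted(potentially_affected(edges, "Util")))               # hub: many dependents
print(sorted(potentially_affected(edges, "Util", max_depth=2)))
print(sorted(potentially_affected(edges, "D")))                  # leaf: nothing
```

A hub with many in-links reaches many CUs, while a CU nothing depends on reaches none. This mirrors the qualitative argument in the text.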

From the software engineering point of view, the usefulness of finding power laws in the tail of the bug distribution can be illustrated by following the reasoning of Louridas et al. [14]. Once the bug distribution across CUs is shown to follow a power law, the CUs in the tail can be identified as the most fault-prone. After a new release is issued, the inspection of CUs for bug detection can then take advantage of this information. For instance, inspecting the top-ranked 5% of CUs would cover a high percentage of the bugs. The exact percentage depends on the power-law exponent.
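How the covered share depends on the exponent can be sketched for an idealized continuous power law with exponent γ > 2 over its whole range. In that case the share of the total held by the top fraction P of units is $W = P^{(\gamma-2)/(\gamma-1)}$, a standard result for Pareto distributions. Real bug counts follow a power law only in the tail, so the numbers are indicative only. The function name is hypothetical.

```python
def top_share(fraction, gamma):
    """Share of the total held by the top `fraction` of units, for an idealized
    continuous power law p(x) ∝ x^-gamma over its whole range (gamma > 2)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if gamma <= 2:
        raise ValueError("the mean is infinite for gamma <= 2")
    return fraction ** ((gamma - 2) / (gamma - 1))


for gamma in (2.5, 3.0, 3.5):
    print(gamma, round(top_share(0.05, gamma), 2))
```

For γ between 2.5 and 3.5, the top 5% of units hold from roughly 37% (γ = 2.5) down to about 17% (γ = 3.5) of the total. The 80-20 split corresponds to γ ≈ 2.16. The share shrinks as γ grows, because larger exponents mean thinner tails.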

3.2. Correlation 

For each version of the system, we analyzed the correlations between the considered software metrics and the number of bugs. From the measured value of a metric, this information can show which parts of the software are most affected by faults. It can also help devise development strategies that keep metric values under control, with the goal of reducing bug introduction.

Our analysis started by computing, for various releases R_i of the system, the linear (Pearson) correlation between a particular CK metric and the number of bugs of the same CUs. This is only a preliminary analysis, aimed at identifying which CU metrics are more related to fault-proneness. Recall that developers distinguish between "main" and "patching" releases, and that the changes from one main release to the next are usually substantial, also in terms of metrics. In the first part of our study we considered the main releases. In the Eclipse project, main releases are identified by two-digit numbers: Eclipse 2.1, Eclipse 3.0, Eclipse 3.1, Eclipse 3.2 and Eclipse 3.3. We analyzed what can be deduced about bugs from the software metrics of this kind of release.
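The linear correlation used here is the Pearson coefficient. A minimal standard-library sketch follows. The per-CU metric values and bug counts in the example are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-CU values for one release: CU-CBO and number of bugs.
cbo = [2, 5, 1, 8, 3]
bugs = [0, 2, 0, 4, 1]
print(round(pearson(cbo, bugs), 2))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient. Since both quantities are fat-tailed, a rank-based coefficient such as Spearman's is a common robustness check, although the text reports only Pearson values.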

Table III shows the correlations between metrics and bugs for the main releases of Eclipse. The metrics with the highest correlation with bugs are those that count the dependencies with other CUs, namely CBO and RFC. This highlights the importance of analyzing a software system as a graph. The out-links metric is less correlated with bugs than CBO and RFC. It includes not only dependency relationships, but also inheritance and implements relationships. Its lower correlation with bugs may be interpreted as a sign that dependency relationships propagate bugs more readily than the other relationships.



Table III: Pearson correlations between metrics and bugs for some releases of Eclipse

Metric pair       2.1    3.0    3.1    3.2    3.3
bugs-LOCS         0.49   0.57   0.54   0.58   0.48
bugs-CBO          0.55   0.53   0.55   0.55   0.42
bugs-RFC          0.59   0.48   0.44   0.56   0.45
bugs-WMC          0.48   0.45   0.38   0.48   0.40
bugs-LCOM         0.30   0.21   0.15   0.34   0.24
bugs-in-links     0.1    0.17   0.25   0.28   0.24
bugs-out-links    0.47   0.38   0.40   0.55   0.42
The low correlation of the in-links metric with bugs indicates that both the number of links and their direction must be taken into account. An out-link directed from a compilation unit A to a compilation unit B can be seen as a channel that eases the propagation of defects from B to A, but not vice versa.

Another metric that is well correlated with bugs is LOCS. This is easily understood, since LOCS is itself well correlated with the CBO, RFC and out-links metrics. Moreover, the larger the CU, the higher the probability that it is hit by some bugs.

The LCOM metric is calculated from the internal structure of a compilation unit, not from its relationships with other CUs. The low correlation between LCOM and bugs suggests that the fault-proneness of a CU is not strongly influenced by the lack of cohesion of the classes it contains. This confirms once again the value of analyzing the software system as a graph.

The data show that, while some metrics are more correlated with the number of bugs than others, the correlations are never very strong. This is sensible: a perfect linear correlation would mean that the number of bugs is an exact linear function of the metric, which never occurs in reality.



Table IV: Pearson correlations between metrics and bugs for all releases of Netbeans

Metric pair       3.2    3.3    3.5    3.6    4.0    5.0    6.0
bugs-LOCS         0.34   0.55   0.42   0.4    0.36   0.34   0.35
bugs-CBO          0.25   0.44   0.37   0.36   0.27   0.28   0.25
bugs-RFC          0.38   0.57   0.44   0.39   0.33   0.31   0.28
bugs-WMC          0.38   0.53   0.38   0.35   0.31   0.27   0.23
bugs-LCOM         0.32   0.44   0.23   0.13   0.10   0.08   0.04
bugs-in-links     0.10   0.16   0.19   0.12   0.05   0.07   0.07
bugs-out-links    0.24   0.40   0.35   0.33   0.24   0.25   0.23


Table IV reports the correlation between each metric and the number of bugs of the CUs for various releases R_i of the Netbeans system. In Netbeans the distinction between main and patching releases is fuzzier than in Eclipse. Moreover, several main releases (MRs) are not followed by classic patching releases (PRs).

Comparing Tables III and IV shows that the correlations between metrics and number of bugs are usually lower in Netbeans than in Eclipse. However, in both systems LOCS and RFC are the two metrics most correlated with CU faultiness, while LCOM shows a weak correlation with CU faultiness.
These results show that:

• Given a release, some metrics are more correlated with CU faultiness than others;

• Considering all releases, no single CK metric is the most correlated in every release;

• Given a metric, its correlation with the number of bugs changes from release to release.
Note, however, that all the correlation coefficients in Tables III and IV are positive, so all the considered metrics are, to a greater or lesser extent, positively correlated with bugs. This is consistent with the observation that all CK metrics and code size measure complexity, and should therefore in general be kept low.

3.3. Analysis of software evolution 

We also analyzed how the metrics evolve between two consecutive releases. For this purpose we define different types of CUs: updated, unmodified and newly introduced. The first two types are defined with respect to each specific metric.

In particular, given a release R_i, the next release R_{i+1} and a metric M, we classified the compilation units into three categories:

• CU.X is the set of compilation units where metric M does not change between R_i and R_{i+1};

• CU.U is the set of compilation units where metric M changes (Updated);

• CU.A is the set of compilation units that exist in R_{i+1} but not in R_i (Added).

Note that the U and X categories are defined relative to a specific metric. Between releases R_i and R_{i+1}, a CU might exhibit a change in metric M but not in metric M'. It then belongs to class CU.U for M and to class CU.X for M'. This case is not common, but it is definitely possible. CU.A is defined regardless of any metric M, since it refers to CUs just introduced in the new release. Some CUs exist in release R_i but not in release R_{i+1}. These deleted CUs are not considered in our study.
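The classification can be sketched as follows for a single metric M. The sketch assumes that CUs are matched across releases by their file path. It takes hypothetical dictionaries mapping each CU to its value of M in R_i and R_{i+1}.

```python
def classify_cus(metric_prev, metric_next):
    """Split the CUs of release R_{i+1} into the U, X and A families for one metric M.

    metric_prev / metric_next: dicts mapping CU name -> value of M in R_i / R_{i+1}.
    CUs present only in R_i (deleted) are ignored, as in the text.
    """
    families = {"U": set(), "X": set(), "A": set()}
    for cu, value in metric_next.items():
        if cu not in metric_prev:
            families["A"].add(cu)      # added in R_{i+1}
        elif value != metric_prev[cu]:
            families["U"].add(cu)      # metric M changed
        else:
            families["X"].add(cu)      # metric M unchanged
    return families


prev = {"A.java": 10, "B.java": 4, "Old.java": 7}   # hypothetical LOC values in R_i
nxt = {"A.java": 12, "B.java": 4, "New.java": 3}    # hypothetical LOC values in R_{i+1}
print(classify_cus(prev, nxt))
```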

For the sets of compilation units in the three categories CU.U, CU.X and CU.A, we compute:

• the fraction of compilation units affected by bugs, which provides an infection probability;

• the average number of bugs of the infected compilation units.
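Given the families and a per-CU bug count for the new release, the two quantities above can be computed as in this sketch. The choice of which release's bugs are counted, and the input format, are assumptions. An empty family, or a family with no infected CUs, yields NaN.

```python
def family_stats(families, bugs_per_cu):
    """Infection probability and mean bugs of infected CUs for each family.

    families: dict like {"U": set_of_cus, "X": ..., "A": ...}.
    bugs_per_cu: dict mapping CU -> number of bugs (missing CUs count as 0).
    Returns {family: (infection_probability, mean_bugs_of_infected)}; NaN marks
    an empty family or a family without infected CUs.
    """
    nan = float("nan")
    stats = {}
    for name, cus in families.items():
        counts = [bugs_per_cu.get(cu, 0) for cu in cus]
        infected = [c for c in counts if c > 0]
        probability = len(infected) / len(counts) if counts else nan
        mean_bugs = sum(infected) / len(infected) if infected else nan
        stats[name] = (probability, mean_bugs)
    return stats


families = {"U": {"a", "b", "c"}, "X": {"d", "e"}, "A": {"f"}}  # hypothetical
bugs = {"a": 3, "b": 1, "d": 2}
for name, (p, m) in family_stats(families, bugs).items():
    print(name, round(p, 2), m)
```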

For various pairs of consecutive releases, Table V shows the probability that a CU belonging to family U, X or A is infected.

In Eclipse, the probability that a CU belonging to family CU.U is infected is between 0.6 and 0.7. Changing the LOCS, CBO or LCOM metrics of a CU from one release to the next therefore has a high probability of injecting at least one error into the compilation unit. This result confirms Purushothaman's study [2], which highlighted that code corrections for defects often introduce new defects. The CUs added to the system in the transition from R_i to R_{i+1} also show a high probability of being infected. It is clearly larger than for CUs that were not modified (set CU.X), and slightly smaller than for set CU.U. Similar results were obtained for all other metrics.

Conversely, if the metric does not change, the probability that a CU is affected by bugs is low. These are bugs already present in R_i but found only when checking release R_{i+1}.

To support our findings about the marked differences among the CU.U, CU.X and CU.A families, we performed chi-square significance tests. We formulated the following null hypothesis: "the subdivision of CUs into U, X and A does not significantly influence the number of infected CUs".

Table V: Fraction of bug-affected CUs between two consecutive releases (shown in the top row), for different families, relative to different metrics in Eclipse

                          Subsequent releases
Metric   Set     2.1.3-3.0   3.0.2-3.1   3.2.2-3.3
LOC      CU.U    0.66        0.61        0.62
         CU.X    0.15        0.17        0.1
CBO      CU.U    0.6         0.7         0.68
         CU.X    0.2         0.27        0.18
LCOM     CU.U    0.7         0.69        0.66
         CU.X    0.22        0.23        0.16
-        CU.A    0.51        0.55        0.58

We verified that, for all values, the test is significant at a confidence level above 99.9% (the actual level is much higher). We can therefore reject the null hypothesis at the 0.1% significance level. Our classification of CUs into families is significantly associated with the presence of bugs.
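The text does not give the exact contingency table used. One plausible layout, assumed here, is a 3x2 table of infected and non-infected CUs per family. With (3 - 1)(2 - 1) = 2 degrees of freedom, the chi-square survival function is exactly exp(-χ²/2), so the p-value needs no statistics library. The counts in the example are invented.

```python
import math


def chi_square_3x2(table):
    """Pearson chi-square test of independence for a 3x2 contingency table.

    table: three rows (families U, X, A), each [infected, not_infected].
    Returns (statistic, p_value), with 2 degrees of freedom.
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected three rows of two counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("the table is empty")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column of the table sums to zero")
            statistic += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return statistic, math.exp(-statistic / 2)


counts = [[620, 380],   # CU.U: infected, not infected (invented numbers)
          [150, 850],   # CU.X
          [275, 225]]   # CU.A
stat, p = chi_square_3x2(counts)
print(round(stat, 1), p < 0.001)
```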

Table VI reports the average number of bugs of the infected CUs. These data confirm that infected CUs of types U and A have a larger average number of bugs than infected CUs of type X. Note also that, on average, more than one bug is found during a release lifespan even in infected CUs that are not changed in the release.

Table VI: Average number of bugs of bug-affected CUs between two consecutive releases (shown in the top row), for different families, relative to different metrics in Eclipse

                          Subsequent releases
Metric   Set     2.1.3-3.0   3.0.2-3.1   3.2.2-3.3
LOC      CU.U    4.02        3.16        2.61
         CU.X    1.38        1.22        1.29
CBO      CU.U    3.92        3.88        3.03
         CU.X    2.36        1.86        1.8
LCOM     CU.U    4.34        3.58        2.95
         CU.X    2           2.64        1.66
-        CU.A    3.2         2.73        2.51

In general, irrespective of the metric, we have:

• CU.U infection probability is around 60-70% 

• CU.A infection probability is around 50-60% 

• CU.X infection probability is around 10-30% 
CUs in CU.U are the most fault-prone, followed by those in CU.A. The mean number of bugs is
in agreement with these results, and varies between:

• about 2.6 and 4.3 for CU.U;

• about 2.5 and 3.2 for CU.A;

• about 1.2 and 2.6 for CU.X.

In Table VII we show the results for Netbeans. Since Netbeans has fewer patching releases
(PRs), we consider jumps between pairs of main releases (MRs). As in Eclipse, in Netbeans
too, variations in LOC, CBO or LCOM are accompanied by a larger introduction of bugs into
the system, whereas the addition of new CUs yields a lower rate of bug injection. Similar
results were obtained for the other metrics.

Table VII: Fraction of infected CUs between two consecutive releases (shown in the top
row), for different families, relative to different metrics in Netbeans.

                 Subsequent releases
Metric   Set     3.1-3.2   3.2.1-3.3   3.6-4.0   4.1-5.0
LOC      CU.U    0.68      0.67        0.62      0.61
         CU.X    0.19      0.15        0.07      0.06
CBO      CU.U    0.51      0.57        0.59      0.67
         CU.X    0.27      0.31        0.18      0.15
LCOM     CU.U    0.53      0.58        0.67      0.6
         CU.X    0.23      0.21        0.12      0.11
         CU.A    0.47      0.28        0.36      0.39


The chi-square significance test on the classification into families for the Netbeans
project, performed with the same null hypothesis used for Eclipse, again yielded confidence
levels higher than 99.9 percent.

Table VIII shows the mean number of bugs for the different CU families. Again, updated and
added CUs show higher mean values than unchanged CUs. In this case, CU.U and CU.A show
closer values than in Eclipse. In Netbeans as well, on average, more than one bug is found
during a release lifespan even in the CUs that are not changed.
Table VIII: Average number of bugs of the infected CUs between two consecutive releases
(shown in the top row), for different families, relative to different metrics in Netbeans.

                 Subsequent releases
Metric   Set     3.1-3.2   3.2.1-3.3   3.6-4.0   4.1-5.0
LOC      CU.U    3.52      3.66        2.69      2.62
         CU.X    1.51      1.44        1.29      1.24
CBO      CU.U    3.36      3.75        3.65      3.87
         CU.X    2.14      2.15        1.89      1.85
LCOM     CU.U    3.28      3.53        3.01      3.14
         CU.X    1.92      1.64        1.58      1.53
         CU.A    2.62      3.08        3.92      3.35

Summarizing these results, we found that:

• the most infected CUs, in both projects, are the updated CUs; their infection probability
reaches almost 70% in both systems;

• CUs belonging to the CU.A set generally exhibit a smaller infection probability than those
in the CU.U set;

• CUs belonging to the CU.X set are much less infected than CUs belonging to CU.A, and their
probability of being hit by a bug never exceeds roughly 30%;

• usually, updated CUs have more bugs than the others; this is always true in Eclipse, and
almost always true in Netbeans;

• in Eclipse, the mean number of bugs of the CU.U sets is often higher than in Netbeans,
whereas the opposite holds for the CU.A set.

One of the main differences between the Eclipse and Netbeans projects is the clear
subdivision between patching releases and main releases. In Eclipse it is easy to verify
that each main release X.0 is always followed by patching releases of type X.0.1, X.0.2,
and so on. This distinction is weaker in Netbeans, and this seems to affect the variation
of its statistics.

For the family of updated compilation units (CU.U), we calculated the correlation between
the fractional change of some metrics, passing from release R_i to release R_{i+1}, and the
number of bugs in R_{i+1}. We were interested in determining whether, and how, the growth
of a metric is associated with an increase in the number of bugs.
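The correlation described above can be sketched as follows. The sketch assumes that the fractional change of a metric m is (m_{i+1} - m_i) / m_i and that CUs with m_i = 0 are skipped; the text does not spell out either choice. The CU.U members below are hypothetical tuples (metric in R_i, metric in R_{i+1}, bugs in R_{i+1}).

```python
import math


def fractional_change(old, new):
    """Relative change (new - old) / old; the caller must ensure old != 0."""
    return (new - old) / old


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical CU.U members: (metric in R_i, metric in R_{i+1}, bugs in R_{i+1}).
cus = [(100, 110, 1), (50, 75, 2), (200, 300, 4), (80, 84, 0), (40, 60, 3)]
changes = [fractional_change(old, new) for old, new, _ in cus if old != 0]
bugs = [b for old, _, b in cus if old != 0]
r = pearson(changes, bugs)
print(f"r = {r:.2f}")
```

On Python 3.10 or later, `statistics.correlation(changes, bugs)` computes the same Pearson coefficient.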

Tables IX and X report this correlation for the Eclipse and Netbeans projects, respectively.

Table IX: Pearson correlation between metric changes and number of defects in the
subsequent release in Eclipse.

                        Subsequent releases
Metric variation   2.1.3-3.0   3.0.2-3.1   3.2.2-3.3
ΔCBO-bugs          0.37        0.58        0.49
ΔLOCS-bugs         0.29        0.64        0.53
ΔRFC-bugs          0.39        0.56        0.51
ΔLCOM-bugs         0.33        0.32        0.49


All the data in Tables IX and X show positive correlations. For a given pair of subsequent
releases, correlation values are quite similar across metrics, whereas for each metric they
fluctuate more from one pair of releases to another. A comparison of Tables IX and X shows
that correlation values in Netbeans are often lower than in Eclipse. This may be partly due
to the less clear subdivision between main and patching releases in the Netbeans project.
Table X: Pearson correlation between metric changes and number of defects in the
subsequent release in Netbeans

                        Subsequent releases
Metric variation   3.1-3.2   3.2.1-3.3   3.6-4.0   4.1-5.0
ΔCBO-bugs          0.18      0.39        0.19      0.55
ΔLOCS-bugs         0.30      0.61        0.55      0.59
ΔRFC-bugs          0.37      0.60        0.56      0.59
ΔLCOM-bugs         0.28      0.68        0.45      0.33


According to Tables V, VII, IX and X, bug introduction is mainly due to updating and adding
CUs. This holds for each metric considered.



4. CONCLUSION

From the software engineering perspective, a statistical description of large software
systems as directed graphs can provide much additional information on system features
compared with more traditional approaches. Adopting a graph as a model of the software
system, we used compilation units as the basic software modules to build a software graph,
and redefined the CK suite of metrics to apply to CUs. These metrics were then used to
investigate, through a statistical analysis, how and where bugs were introduced into two
large OO software projects, Eclipse and Netbeans. We wrote two different parsers to analyze
the CVS log files and the issue tracker repositories, in order to automatically associate
bugs with CUs. In this paper, we introduced the concept of a compilation unit graph, and of
OO metrics related to compilation units, with the purpose of analyzing software projects
managed with a configuration management system and a corresponding bug tracking system.
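The compilation unit graph can be sketched with a plain edge list, where each directed pair (a, b) stands for some relation from CU a to CU b. The file names are hypothetical, and counting repeated relations between the same two CUs once and ignoring self-references are assumptions of this sketch, not a description of the authors' parser. The resulting in- and out-degree distributions are the quantities whose tails are discussed below.

```python
from collections import Counter


def cu_degrees(edges):
    """Return {cu: (in_degree, out_degree)} for a directed CU graph.

    edges: iterable of (source_cu, target_cu) pairs. Duplicate pairs are
    counted once and self-loops are ignored (assumptions of this sketch).
    """
    in_deg, out_deg, nodes = Counter(), Counter(), set()
    for src, dst in set(edges):
        nodes.update((src, dst))
        if src == dst:
            continue
        out_deg[src] += 1
        in_deg[dst] += 1
    return {cu: (in_deg[cu], out_deg[cu]) for cu in nodes}


def degree_distribution(degrees):
    """Map each degree value k to the number of CUs having degree k."""
    return dict(Counter(degrees))


# Hypothetical dependency edges between CUs.
edges = [
    ("Main.java", "Parser.java"),
    ("Main.java", "Graph.java"),
    ("Main.java", "Parser.java"),  # repeated relation, counted once
    ("Parser.java", "Graph.java"),
    ("Metrics.java", "Graph.java"),
]
degrees = cu_degrees(edges)
in_dist = degree_distribution(d_in for d_in, _ in degrees.values())
out_dist = degree_distribution(d_out for _, d_out in degrees.values())
print(degrees)
print(in_dist, out_dist)
```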
The picture of the software system as a graph allowed us to detect fat-tailed
distributions, well described by power laws, for different features of the system,
suggesting the same general underlying framework as many other complex networks. In
particular, we found that the distribution of bugs among CUs, the number of CUs affected by
each bug, and the distributions of metrics (namely LOC, the number of in-links and
out-links of the class graph, and the CK metrics WMC, CBO, RFC and LCOM) all exhibit
power-law fat tails.
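This section does not say how the power-law tails were fitted. As an assumption, one common choice is the continuous maximum-likelihood estimator of Clauset, Shalizi and Newman, alpha = 1 + n / Σ ln(x_i / x_min), computed over the n values x_i ≥ x_min. The sketch below checks it on synthetic Pareto data; for integer-valued quantities such as LOC or bug counts it is only an approximation, and the choice of x_min strongly affects the result.

```python
import math
import random


def power_law_alpha_mle(values, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


# Synthetic check: draw from a Pareto tail with alpha = 2.5 by inverse transform
# sampling, x = x_min * (1 - u) ** (-1 / (alpha - 1)), then re-estimate alpha.
rng = random.Random(42)
true_alpha, x_min = 2.5, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)]
print(round(power_law_alpha_mle(sample, x_min), 3))
```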

Within this framework it is possible to identify strong correlations between bugs and those
metrics related to the number of external dependencies, which in the graph representation
are easily described as directed links. Together, these findings indicate a possible
strategy to optimize resources and effort in software engineering for finding, forecasting,
and fixing software defects. Once the software graph reveals the fat tail in the
relationship between bugs and CUs, one may identify which parts of the software are the
most fault-prone and focus fixing efforts on them. Following [4], if one ranks CUs according
to these power laws, reviewing a small fraction of the highest-ranked CUs may have a
disproportionately large impact on the overall amount of software defects that can be
detected and fixed.
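The ranking argument can be illustrated by sorting CUs by bug count and measuring which share of all bugs falls in the top k CUs. Ranking by observed bug count is an assumption of this sketch (the text speaks of ranking according to the fitted power laws), and the counts below are hypothetical.

```python
def top_k_share(bug_counts, k):
    """Share of all bugs contained in the k CUs with the most bugs."""
    ranked = sorted(bug_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Hypothetical bug counts for ten CUs (not data from the paper).
bugs_per_cu = [0, 3, 50, 1, 5, 20, 2, 10, 4, 5]
for k in (1, 2, 3):
    print(k, top_k_share(bugs_per_cu, k))
```

With a fat-tailed distribution, the share grows much faster than k / n for small k, which is what makes reviewing the highest-ranked CUs first attractive.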
Our analysis goes one step further, examining software evolution across many releases. This
study identifies how metric evolution is related to bug introduction: the change of some
particular metrics may result in a higher probability of introducing a bug. In particular,
we identify different families of CUs, related to CK metrics, that show different
robustness with respect to bugs. Our categorization into families is related to software
evolution, and it is useful for investigating correlations between bugs and changes in
software metrics. This classification may be particularly useful to software engineers in
deciding on which parts of a large software project to concentrate efforts and resources.